278 PART 5 Looking for Relationships with Correlation and Regression

Working with unequal observation intervals

In this fatal accident example, each of the 12 data points represents the accidents

observed during a one-year interval. But imagine analyzing the frequency of

emergency department visits by patients who have been treated for emphysema,

where there is one data point per patient. In that case, the width of the observation

interval may vary from one individual in the data to another. GLM lets you

provide an interval width along with the event count for each individual in the

data. For arcane reasons, many statistical programs refer to this interval-width

variable (usually supplied as its logarithm when the link is log) as the offset.
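
To see how an offset might look in practice, here is a minimal sketch in R. The data frame patients, its column names (visits, years, age), and every value in it are invented purely for illustration; they don't come from any real study. The sketch assumes a log link, in which case R expects the offset as the logarithm of the interval width.

```r
# Hypothetical data: one row per patient (all values invented for illustration)
patients <- data.frame(
  visits = c(2, 0, 5, 1, 3, 0, 4, 1),                    # emergency department visits
  years  = c(1.0, 0.5, 2.5, 1.0, 2.0, 0.25, 3.0, 1.5),   # width of each observation interval
  age    = c(61, 58, 72, 66, 70, 55, 75, 63)             # an example predictor
)

# With a log link, the offset enters as log(interval width), so the model
# describes visits per year of follow-up rather than raw visit counts.
fit <- glm(visits ~ age + offset(log(years)),
           family = poisson(link = "log"),
           data = patients)

# Exponentiated coefficients: the baseline rate and the rate ratio per year of age
exp(coef(fit))
```

Because the offset's coefficient is fixed at 1 instead of being estimated, a patient followed for twice as long is simply expected to have twice as many visits, all else being equal.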

Accommodating clustered events

The Poisson distribution applies when the observed events are all independent

occurrences. But this assumption isn’t met if events occur in clusters. Suppose

you count individual highway fatalities instead of fatal highway accidents. In that

case, the Poisson distribution doesn’t apply, because one fatal accident may kill

several people. This is what is meant by clustered events.

The standard deviation (SD) of a Poisson distribution is equal to the square root of

the mean of the distribution. But if clustering is present, the SD of the data is

larger than the square root of the mean. This situation is called overdispersion.

GLM in R can correct for overdispersion if you designate the distribution family

quasipoisson rather than poisson, like this:

glm(formula = Accidents ~ Year, family = quasipoisson(link = "log"))
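
If you want to see what the quasipoisson family changes, the following sketch may help. It doesn't use the accident data from this chapter. Instead, it simulates clustered counts (the variable Deaths, the seed, and the simulation settings are all assumptions made up for this illustration) and fits the same model with both families.

```r
set.seed(1)                        # arbitrary seed so the sketch is repeatable
n_years   <- 12
Year      <- 1:n_years
accidents <- rpois(n_years, lambda = 10)   # independent fatal accidents per year

# Clustering: each simulated accident kills at least one person, sometimes more
Deaths <- sapply(accidents, function(k) sum(1 + rpois(k, lambda = 1)))

fit_p <- glm(Deaths ~ Year, family = poisson(link = "log"))
fit_q <- glm(Deaths ~ Year, family = quasipoisson(link = "log"))

# Estimated dispersion; a value near 1 would be consistent with a Poisson model
summary(fit_q)$dispersion

# The coefficients are identical, but quasipoisson rescales the standard errors
cbind(poisson      = coef(summary(fit_p))[, "Std. Error"],
      quasipoisson = coef(summary(fit_q))[, "Std. Error"])
```

The quasipoisson fit leaves the estimated trend unchanged. It multiplies each standard error by the square root of the estimated dispersion, so overdispersed data yield wider confidence intervals and larger p values.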

FIGURE 19-5: Linear and exponential trends fitted to accident data.

© John Wiley & Sons, Inc.